Category 4 - Computers

The Wassenaar Arrangement - Dual-Use and Munitions Lists - July 1996

DUAL-USE

CATEGORY 4 - COMPUTERS

Note 1 Computers, related equipment and "software" performing telecommunications or "local area network" functions must also be evaluated against the performance characteristics of Category 5, Part 1 (Telecommunications).

N.B. 1. Control units which directly interconnect the buses or channels of central processing units, "main storage" or disk controllers are not regarded as telecommunications equipment described in Category 5, Part 1 (Telecommunications).
N.B. 2. For the control status of "software" specially designed for packet switching, see Item 5.D.1. (Telecommunications).

Note 2 Computers, related equipment and "software" performing cryptographic, cryptanalytic, certifiable multi-level security or certifiable user isolation functions, or which limit electromagnetic compatibility (EMC), must also be evaluated against the performance characteristics in Category 5, Part 2 ("Information Security").

4.A. SYSTEMS, EQUIPMENT AND COMPONENTS

4.A.1. Electronic computers and related equipment, as follows, and "electronic assemblies" and specially designed components therefor:

a. Specially designed to have any of the following characteristics:
1. Rated for operation at an ambient temperature below 228 K (-45°C) or above 358 K (85°C);
Note: 4.A.1.a.1. does not apply to computers specially designed for civil automobile or railway train applications.
2. Radiation hardened to exceed any of the following specifications:
a. Total Dose 5 x 10³ Gy (Si);
b. Dose Rate Upset 5 x 10⁶ Gy (Si)/sec; or
c. Single Event Upset 1 x 10⁻⁷ Error/bit/day;

b. Having characteristics or performing functions exceeding the limits in Category 5, Part 2 ("Information Security").

4.A.2. "Hybrid computers", as follows, and "electronic assemblies" and specially designed components therefor:

a. Containing "digital computers" specified in 4.A.3.;
b. Containing analogue-to-digital converters having all of the following characteristics:
1. 32 channels or more; and
2. A resolution of 14 bits (plus sign bit) or more with a conversion rate of 200,000 conversions/s or more.

4.A.3. "Digital computers", "electronic assemblies", and related equipment therefor, as follows, and specially designed components therefor:

Note 1 4.A.3. includes the following:
a. Vector processors;
b. Array processors;
c. Digital signal processors;
d. Logic processors;
e. Equipment designed for "image enhancement";
f. Equipment designed for "signal processing".

Note 2 The control status of the "digital computers" and related equipment described in 4.A.3. is determined by the control status of other equipment or systems provided:
a. The "digital computers" or related equipment are essential for the operation of the other equipment or systems;
b. The "digital computers" or related equipment are not a "principal element" of the other equipment or systems; and
N.B. 1 The control status of "signal processing" or "image enhancement" equipment specially designed for other equipment with functions limited to those required for the other equipment is determined by the control status of the other equipment even if it exceeds the "principal element" criterion.
N.B. 2 For the control status of "digital computers" or related equipment for telecommunications equipment, see Category 5, Part 1 (Telecommunications).

c. The "technology" for the "digital computers" and related equipment is determined by 4.E.

a. Designed or modified for "fault tolerance";
Note: For the purposes of 4.A.3.a., "digital computers" and related equipment are not considered to be designed or modified for "fault tolerance" if they utilise any of the following:
1. Error detection or correction algorithms in "main storage";
2. The interconnection of two "digital computers" so that, if the active central processing unit fails, an idling but mirroring central processing unit can continue the system's functioning;

3. The interconnection of two central processing units by data channels or by use of shared storage to permit one central processing unit to perform other work until the second central processing unit fails, at which time the first central processing unit takes over in order to continue the system's functioning; or
4. The synchronisation of two central processing units by "software" so that one central processing unit recognises when the other central processing unit fails and recovers tasks from the failing unit.

b. "Digital computers" having a "composite theoretical performance" ("CTP") exceeding 710 million theoretical operations per second (Mtops);
c. "Electronic assemblies" specially designed or modified to be capable of enhancing performance by aggregation of "computing elements" ("CEs") so that the "CTP" of the aggregation exceeds the limit in 4.A.3.b.;
Note 1 4.A.3.c. applies only to "electronic assemblies" and programmable interconnections not exceeding the limit in 4.A.3.b. when shipped as unintegrated "electronic assemblies". It does not apply to "electronic assemblies" inherently limited by nature of their design for use as related equipment specified in 4.A.3.d., 4.A.3.e. or 4.A.3.f.
Note 2 4.A.3.c. does not control "electronic assemblies" specially designed for a product or family of products whose maximum configuration does not exceed the limit of 4.A.3.b.

d. Graphics accelerators and graphics coprocessors exceeding a "three dimensional Vector Rate" of 3,000,000;
e. Equipment performing analogue-to-digital conversions exceeding the limits in 3.A.1.a.5.;
f. Equipment containing "terminal interface equipment" exceeding the limits in 5.A.1.b.3.;
Note For the purposes of 4.A.3.f., "terminal interface equipment" includes "local area network" interfaces and other communications interfaces. "Local area network" interfaces are evaluated as "network access controllers".

g. Equipment specially designed to provide external interconnection of "digital computers" or associated equipment which allows communications at data rates exceeding 80 Mbyte/s.
Note: 4.A.3.g. does not control internal interconnection equipment (e.g., backplanes, buses) or passive interconnection equipment.

4.A.4. Computers, as follows, and specially designed related equipment, "electronic assemblies" and components therefor:

a. "Systolic array computers";
b. "Neural computers";
c. "Optical computers".

4.B. TEST, INSPECTION AND PRODUCTION EQUIPMENT - None

4.C. MATERIALS - None.

4.D. SOFTWARE

Note The control status of "software" for the "development", "production", or "use" of equipment described in other Categories is dealt with in the appropriate Category. The control status of "software" for equipment described in this Category is dealt with herein.

4.D.1. "Software" specially designed or modified for the "development", "production" or "use" of equipment or "software" specified in 4.A. or 4.D.

4.D.2. "Software" specially designed or modified to support "technology" specified in 4.E.

4.D.3. Specific "software", as follows:

a. Operating system "software", "software" development tools and compilers specially designed for "multi-data-stream processing" equipment, in "source code";
b. "Expert systems" or "software" for "expert system" inference engines providing both:
1. Time dependent rules; and
2. Primitives to handle the time characteristics of the rules and the facts;

c. "Software" having characteristics or performing functions exceeding the limits in Category 5, Part 2 ("Information Security");
d. Operating systems specially designed for "real time processing" equipment which guarantees a "global interrupt latency time" of less than 20 µs.

4.E. TECHNOLOGY

4.E.1. "Technology" according to the General Technology Note, for the "development", "production" or "use" of equipment or "software" specified in 4.A. or 4.D.

TECHNICAL NOTE ON "COMPOSITE THEORETICAL PERFORMANCE" ("CTP")

Abbreviations used in this Technical Note

"CE"    "computing element" (typically an arithmetic logical unit)
FP      floating point
XP      fixed point
t       execution time
XOR     exclusive OR
CPU     central processing unit
TP      theoretical performance (of a single "CE")
"CTP"   "composite theoretical performance" (multiple "CEs")
R       effective calculating rate
WL      word length
L       word length adjustment
*       multiply

Execution time 't' is expressed in microseconds, TP and "CTP" are expressed in millions of theoretical operations per second (Mtops) and WL is expressed in bits.

Outline of "CTP" calculation method

"CTP" is a measure of computational performance given in Mtops. In calculating the "CTP" of an aggregation of "CEs" the following three steps are required:
1. Calculate the effective calculating rate R for each "CE";
2. Apply the word length adjustment (L) to the effective calculating rate (R), resulting in a Theoretical Performance (TP) for each "CE";
3. If there is more than one "CE", combine the TPs, resulting in a "CTP" for the aggregation.

Details for these steps are given in the following sections.
Note 1 For aggregations of multiple "CEs" which have both shared and unshared memory subsystems, the calculation of "CTP" is completed hierarchically, in two steps: first, aggregate the groups of "CEs" sharing memory; second, calculate the "CTP" of the groups using the calculation method for multiple "CEs" not sharing memory.
Note 2 "CEs" that are limited to input/output and peripheral functions (e.g., disk drive, communication and video display controllers) are not aggregated into the "CTP" calculation.

The following table shows the method of calculating the effective calculating rate R for each "CE":

Step 1: The effective calculating rate R

Note Every "CE" must be evaluated independently.

For "CEs" implementing XP only (R_xp):

R_xp = 1 / (3 * t_{xp add})

If no add is implemented, use:

R_xp = 1 / t_{xp mult}

If neither add nor multiply is implemented, use the fastest available arithmetic operation as follows:

R_xp = 1 / (3 * t_xp)

See Notes X & Z

For "CEs" implementing FP only (R_fp):

R_fp = max (1 / t_{fp add}, 1 / t_{fp mult})

See Notes X & Y

For "CEs" implementing both FP and XP (R):

Calculate both R_xp and R_fp

For simple logic processors not implementing any of the specified arithmetic operations:

R = 1 / (3 * t_log)

Where t_log is the execute time of the XOR, or for logic hardware not implementing the XOR, the fastest simple logic operation.
See Notes X & Z

For special logic processors not using any of the specified arithmetic or logic operations:

R = R' * WL/64

Where R' is the number of results per second, WL is the number of bits upon which the logic operation occurs, and 64 is a factor to normalize to a 64 bit operation.
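
The fixed-point and simple-logic rules of Step 1 can be sketched in Python as follows. This is an illustration only and not part of the Arrangement text: the function and parameter names (rate_xp, rate_simple_logic, t_add, t_mult, t_fastest, t_log) are hypothetical. Execution times are assumed to be in microseconds, as stated for 't' in this Technical Note, so the returned rates are in millions of operations per second.

```python
from typing import Optional


def rate_xp(t_add: Optional[float] = None,
            t_mult: Optional[float] = None,
            t_fastest: Optional[float] = None) -> float:
    """Effective fixed-point rate R_xp for one CE (times in microseconds)."""
    if t_add is not None:
        # Add is implemented: R_xp = 1 / (3 * t_xp_add)
        return 1.0 / (3.0 * t_add)
    if t_mult is not None:
        # No add, but multiply is implemented: R_xp = 1 / t_xp_mult
        return 1.0 / t_mult
    if t_fastest is not None:
        # Neither add nor multiply: fastest arithmetic operation, 1 / (3 * t_xp)
        return 1.0 / (3.0 * t_fastest)
    raise ValueError("no fixed-point execution time supplied")


def rate_simple_logic(t_log: float) -> float:
    """Rate for a simple logic processor: R = 1 / (3 * t_log)."""
    return 1.0 / (3.0 * t_log)
```

For example, a hypothetical "CE" with a 0.1 microsecond fixed-point add would be evaluated with rate_xp(t_add=0.1), and one that implements only multiply at 0.5 microseconds with rate_xp(t_mult=0.5).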

Note W

For a pipelined "CE" capable of executing up to one arithmetic or logic operation every clock cycle after the pipeline is full, a pipelined rate can be established. The effective calculating rate (R) for such a "CE" is the faster of the pipelined rate or non-pipelined execution rate.

Note X

For a "CE" which performs multiple operations of a specific type in a single cycle (e.g., two additions per cycle or two identical logic operations per cycle), the execution time t is given by:

t = cycle time / (the number of identical operations per machine cycle)

"CEs" which perform different types of arithmetic or logic operations in a single machine cycle are to be treated as multiple separate "CEs" performing simultaneously (e.g., a "CE" performing an addition and a multiplication in one cycle is to be treated as two "CEs", the first performing an addition in one cycle and the second performing a multiplication in one cycle).
If a single "CE" has both scalar function and vector function, use the shorter execution time value.
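
As a small sketch of Note X in Python: the helper names below (execution_time, shorter_time) are hypothetical, and times are assumed to be in microseconds.

```python
def execution_time(cycle_time_us: float, ops_per_cycle: int = 1) -> float:
    """Execution time t for a CE doing several identical operations per cycle."""
    if ops_per_cycle < 1:
        raise ValueError("ops_per_cycle must be at least 1")
    # t = cycle time / number of identical operations per machine cycle
    return cycle_time_us / ops_per_cycle


def shorter_time(t_scalar_us: float, t_vector_us: float) -> float:
    """For a CE with both scalar and vector function, use the shorter time."""
    return min(t_scalar_us, t_vector_us)
```

A hypothetical "CE" with a 0.02 microsecond cycle that completes two additions per cycle would use execution_time(0.02, 2) as its add time.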

Note Y

For the "CE" that does not implement FP add or FP multiply, but that performs FP divide:

R_fp = 1 / t_{fp divide}

If the "CE" implements FP reciprocal but not FP add, FP multiply or FP divide, then

R_fp = 1 / t_{fp reciprocal}

If none of the specified instructions is implemented, the effective FP rate is 0.
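
The floating-point rate, including the fallbacks of Note Y, might be sketched in Python as below. The function name rate_fp and its parameters are hypothetical, and times are assumed to be in microseconds. One reading is assumed here and labeled in the code: when only one of FP add or FP multiply is implemented, that operation alone determines the rate.

```python
from typing import Optional


def rate_fp(t_add: Optional[float] = None,
            t_mult: Optional[float] = None,
            t_divide: Optional[float] = None,
            t_reciprocal: Optional[float] = None) -> float:
    """Effective floating-point rate R_fp for one CE (times in microseconds)."""
    # Assumption: if only one of add/multiply exists, use that one alone.
    primary = [1.0 / t for t in (t_add, t_mult) if t is not None]
    if primary:
        # R_fp = max(1 / t_fp_add, 1 / t_fp_mult)
        return max(primary)
    if t_divide is not None:
        # No add or multiply, but divide is implemented
        return 1.0 / t_divide
    if t_reciprocal is not None:
        # Only reciprocal is implemented
        return 1.0 / t_reciprocal
    # None of the specified instructions is implemented
    return 0.0
```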

Note Z

In simple logic operations, a single instruction performs a single logic manipulation of no more than two operands of given lengths.
In complex logic operations, a single instruction performs multiple logic manipulations to produce one or more results from two or more operands.

Rates should be calculated for all supported operand lengths considering both pipelined operations (if supported), and non-pipelined operations using the fastest executing instruction for each operand length based on:
1. Pipelined or register-to-register operations. Exclude extraordinarily short execution times generated for operations on a predetermined operand or operands (for example, multiplication by 0 or 1). If no register-to-register operations are implemented, continue with (2).
2. The faster of register-to-memory or memory-to-register operations; if these also do not exist, then continue with (3).
3. Memory-to-memory.

In each case above, use the shortest execution time certified by the manufacturer.

Step 2: TP for each supported operand length WL

Adjust the effective rate R (or R') by the word length adjustment L as follows:

TP = R * L,
where L = (1/3 + WL/96)

Note The word length WL used in these calculations is the operand length in bits. (If an operation uses operands of different lengths, select the largest word length.)
The combination of a mantissa ALU and an exponent ALU of a floating point processor or unit is considered to be one "CE" with a Word Length (WL) equal to the number of bits in the data representation (typically 32 or 64) for purposes of the "CTP" calculation.

This adjustment is not applied to specialized logic processors which do not use XOR instructions. In this case TP = R.

Select the maximum resulting value of TP for:
Each XP-only "CE" (R_xp);
Each FP-only "CE" (R_fp);
Each combined FP and XP "CE" (R);
Each simple logic processor not implementing any of the specified arithmetic operations; and
Each special logic processor not using any of the specified arithmetic or logic operations.
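
Step 2 can be illustrated with a short Python sketch. The names (word_length_adjustment, theoretical_performance, max_tp) and the adjust flag are hypothetical. The flag stands in for the case described above where the adjustment is not applied and TP = R.

```python
from typing import Dict


def word_length_adjustment(wl_bits: int) -> float:
    """Word length adjustment L = 1/3 + WL/96."""
    return 1.0 / 3.0 + wl_bits / 96.0


def theoretical_performance(rate: float, wl_bits: int,
                            adjust: bool = True) -> float:
    """TP = R * L, or TP = R when the adjustment is not applied."""
    return rate * word_length_adjustment(wl_bits) if adjust else rate


def max_tp(rates_by_wl: Dict[int, float]) -> float:
    """Largest TP over all supported operand lengths (bits -> rate R)."""
    return max(theoretical_performance(r, wl) for wl, r in rates_by_wl.items())
```

Under this formula a 64-bit operand gives L = 1, so TP equals R, and a 32-bit operand gives L = 2/3. For a hypothetical "CE" with R = 12 at 32 bits and R = 10 at 64 bits, max_tp({32: 12.0, 64: 10.0}) selects the 64-bit value.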

Step 3: "CTP" for aggregations of "CEs", including CPUs

For a CPU with a single "CE",

"CTP" = TP

(for "CEs" performing both fixed and floating point operations, TP = max (TP_fp, TP_xp))

"CTP" for aggregations of multiple "CEs" operating simultaneously is calculated as follows:
Note 1 For aggregations that do not allow all of the "CEs" to run simultaneously, the possible combination of "CEs" that provides the largest "CTP" should be used. The TP of each contributing "CE" is to be calculated at its maximum value theoretically possible before the "CTP" of the combination is derived.
N.B. To determine the possible combinations of simultaneously operating "CEs", generate an instruction sequence that initiates operations in multiple "CEs", beginning with the slowest "CE" (the one needing the largest number of cycles to complete its operation) and ending with the fastest "CE". At each cycle of the sequence, the combination of "CEs" that are in operation during that cycle is a possible combination. The instruction sequence must take into account all hardware and/or architectural constraints on overlapping operations.

Note 2 A single integrated circuit chip or board assembly may contain multiple "CEs".
Note 3 Simultaneous operations are assumed to exist when the computer manufacturer claims concurrent, parallel or simultaneous operation or execution in a manual or brochure for the computer.

Note 4 "CTP" values are not to be aggregated for "CE" combinations (inter)connected by "Local Area Networks", Wide Area Networks, I/O shared connections/devices, I/O controllers and any communication interconnection implemented by software.

Note 5 "CTP" values must be aggregated for multiple "CEs" specially designed to enhance performance by aggregation, operating simultaneously and sharing memory, or multiple memory/"CE" combinations operating simultaneously utilising specially designed hardware.
This aggregation does not apply to "electronic assemblies" described by 4.A.3.d.

"CTP" = TP₁ + C₂ * TP₂ + ... + C_n * TP_n,

where the TPs are ordered by value, with TP₁ being the highest, TP₂ being the second highest, ..., and TP_n being the lowest. C_i is a coefficient determined by the strength of the interconnection between "CEs", as follows:

For multiple "CEs" operating simultaneously and sharing memory:

C₂ = C₃ = C₄ = ... = C_n = 0.75

Note 1 When the "CTP" calculated by the above method does not exceed 194 Mtops, the following formula may be used to calculate C_i:

C_i = 0.75 / sqrt(m)    (i = 2, ... , n)

where m = the number of "CEs" or groups of "CEs" sharing access.

provided:
1. The TP_i of each "CE" or group of "CEs" does not exceed 30 Mtops;
2. The "CEs" or groups of "CEs" share access to main memory (excluding cache memory) over a single channel; and
3. Only one "CE" or group of "CEs" can have use of the channel at any given time.
N.B. This does not apply to items controlled under Category 3.

Note 2 "CEs" share memory if they access a common segment of solid state memory. This memory may include cache memory, main memory or other internal memory. Peripheral memory devices such as disk drives, tape drives or RAM disks are not included.
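
The shared-memory aggregation might be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the function name ctp_shared_memory and the single_exclusive_channel flag are hypothetical, the flag stands in for provisos 2 and 3 of Note 1, and m is assumed to equal the number of TP values supplied.

```python
import math
from typing import Sequence

STANDARD_COEFFICIENT = 0.75


def ctp_shared_memory(tps: Sequence[float],
                      single_exclusive_channel: bool = False) -> float:
    """CTP in Mtops for CEs (or groups) operating simultaneously, sharing memory."""
    if not tps:
        raise ValueError("at least one TP value is required")
    ordered = sorted(tps, reverse=True)  # TP_1 is the highest
    standard = ordered[0] + STANDARD_COEFFICIENT * sum(ordered[1:])
    # Note 1: the reduced coefficient may be used only if every proviso holds.
    eligible = (single_exclusive_channel
                and standard <= 194.0
                and all(tp <= 30.0 for tp in ordered))
    if not eligible:
        return standard
    m = len(ordered)  # assumption: one entry per CE or group sharing access
    reduced = STANDARD_COEFFICIENT / math.sqrt(m)
    return ordered[0] + reduced * sum(ordered[1:])
```

For three hypothetical "CEs" with TPs of 100, 50 and 50 Mtops, the standard coefficient applies and the result is 100 + 0.75 * 100 = 175 Mtops. For four "CEs" of 20 Mtops each on a single exclusive channel, the standard result is 65 Mtops, so the reduced coefficient 0.75 / sqrt(4) = 0.375 may be used, giving 42.5 Mtops.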

For Multiple "CEs" or groups of "CEs" not sharing memory, interconnected by one or more data channels:

C_i = 0.75 * k_i (i = 2, ... , 32) (see Note below)

= 0.60 * k_i (i = 33, ... , 64)

= 0.45 * k_i (i = 65, ... , 256)

= 0.30 * k_i (i > 256)

The value of C_i is based on the number of "CEs", not the number of nodes.

where

k_i = min (S_i/K_r, 1), and

K_r = normalizing factor of 20 MByte/s.

S_i = sum of the maximum data rates (in units of MByte/s) for all data channels connected to the i^th "CE" or group of "CEs" sharing memory.

When calculating a C_i for a group of "CEs", the number of the first "CE" in a group determines the proper limit for C_i. For example, in an aggregation of groups consisting of 3 "CEs" each, the 22nd group will contain "CE"₆₄, "CE"₆₅ and "CE"₆₆. The proper limit for C_i for this group is 0.60.

Aggregation (of "CEs" or groups of "CEs") should be from the fastest-to-slowest; i.e.:
TP₁ ≥ TP₂ ≥ ... ≥ TP_n, and

in the case of TP_i = TP_{i+1}, from the largest to smallest; i.e.:
C_i ≥ C_{i+1}

Note The k_i factor is not to be applied to "CEs" 2 to 12 if the TP_i of the "CE" or group of "CEs" is more than 50 Mtops; i.e., C_i for "CEs" 2 to 12 is 0.75.
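
The aggregation for "CEs" not sharing memory might be sketched in Python as below. This is a simplified illustration: the names (tier_coefficient, k_factor, ctp_not_sharing_memory) are hypothetical, each entry is assumed to be a single "CE" rather than a group, so that the position i is also the "CE" number, and ties in TP are broken by the larger k_i as an approximation of the ordering rule.

```python
from typing import List, Tuple

K_R = 20.0  # normalizing factor in MByte/s


def tier_coefficient(i: int) -> float:
    """Coefficient applied to k_i for the i-th CE (i >= 2)."""
    if i < 2:
        raise ValueError("C_i is defined for i >= 2")
    if i <= 32:
        return 0.75
    if i <= 64:
        return 0.60
    if i <= 256:
        return 0.45
    return 0.30


def k_factor(s_i: float) -> float:
    """k_i = min(S_i / K_r, 1), with S_i in MByte/s."""
    return min(s_i / K_R, 1.0)


def ctp_not_sharing_memory(elements: List[Tuple[float, float]]) -> float:
    """CTP in Mtops from (TP in Mtops, S_i in MByte/s) pairs, one per CE."""
    if not elements:
        raise ValueError("at least one CE is required")
    # Fastest to slowest; equal TPs ordered by larger k_i first (assumption).
    ordered = sorted(elements, key=lambda e: (e[0], k_factor(e[1])),
                     reverse=True)
    total = ordered[0][0]  # TP_1 enters with no coefficient
    for i, (tp, s_i) in enumerate(ordered[1:], start=2):
        # Note: k_i is not applied to CEs 2 to 12 when TP_i exceeds 50 Mtops.
        k = 1.0 if (i <= 12 and tp > 50.0) else k_factor(s_i)
        total += tier_coefficient(i) * k * tp
    return total
```

For three hypothetical "CEs" with (TP, S_i) of (100, 40), (60, 10) and (40, 10), the second keeps C₂ = 0.75 because its TP exceeds 50 Mtops, while the third uses k₃ = 0.5, giving 100 + 0.75 * 60 + 0.75 * 0.5 * 40 = 160 Mtops.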

Hypertext by JYA/Urban Deadline.
